Sony’s Emotionally Charged Chip


Killer Floating-Point “Emotion Engine” To Power PlayStation 2000 


by Keith Diefendorff


While Intel and the PC industry stumble around in 
search of some need for the processing power they already 
have, Sony has been busy trying to figure out how to get more 
of it— lots more. The company has apparently succeeded: at 
the recent International Solid-State Circuits Conference (see 
MPR 4/19/99, p. 20), Sony Computer Entertainment (SCE) 
and Toshiba described a multimedia processor that will be the 
heart of the next-generation PlayStation, which (lacking an
official name) we refer to as PlayStation 2000, or PSX2.

Called the Emotion Engine (EE), the new chip upsets 
the traditional notion of a game processor. Whereas game 
CPUs have typically been cheap and wimpy compared with 
those in PCs, the EE is neither. At a whopping 240 mm² in a
0.25-micron process, the 10.5-million-transistor chip will cost
more than $100 to manufacture, according to our cost model.
Never mind the companion 279-mm² rendering chip, called
the graphics synthesizer (GS), or the I/O processor (IOP), 
which includes a complete first-generation PlayStation CPU 
for backward compatibility, as Figure 1 shows. 

The EE and GS die sizes are frightening; vendors of PC
processors break out in a cold sweat at the mere thought of a
die larger than about 180 mm². How Toshiba and SCE intend
to build two chips larger than that for a consumer game
console is unclear. But the companies are intent on doing so; two
large fabs are now being readied for just this purpose.

While the EE is not cheap, neither is it wimpy. The
300-MHz part packs a floating-point punch of 6.2 GFLOPS,
three times that of Intel’s top-of-the-line 500-MHz
Pentium III with SSE (see MPR 3/8/99, p. 1) and 15 times that of
a Celeron-400 (which lacks SSE). With the EE pumping out
66 million polygons per second and the GS drawing polygons
at 2.4 billion pixels per second, the PlayStation 2000 will
bring Toy Story-like realism to home games, says SCE.


PlayStation Rules 
Since 1994, when it was first introduced, the PlayStation has
amassed sales of 54 million units and has now reached a run
rate of two million units per month, making it the most
successful single product (in units) Sony has ever built.

Although SCE has cornered more than 60% of the
$6 billion game-console market, it was beginning to feel the
heat from Sega's Dreamcast (see MPR 6/1/98, p. 8), which has
sold over a million units since its debut last November. With
a 200-MHz Hitachi SH-4 and NEC’s PowerVR graphics chip,
Dreamcast delivers 3 to 10 times as many 3D polygons as
PlayStation’s 34-MHz MIPS processor (see MPR 7/11/94,
p. 9). To maintain king-of-the-mountain status, SCE had to
do something spectacular. And it has: the PSX2 will deliver
more than 10 times the polygon throughput of Dreamcast,
leaving it and other competitors in the virtual dust.

With DVD-ROM, Dolby Digital (AC-3) and Digital
Theater System (DTS) sound, 32M of memory, a modem,
IEEE-1394, and USB, the PSX2 system could be more than
just a game console. Able to perform many of the functions
for which people buy sub-$600 PCs, the PSX2 has the potential
to swipe a chunk of the low-end market from under the
noses of PC vendors, x86 vendors, and Microsoft. The PSX2
could also throw a monkey wrench into the plans of dozens
of Silicon Valley startups (such as VM Labs) working on
DVD-based home-entertainment gizmos and could cut
deeply into the market for WebTVs and similar devices, an
event we have already forecast (see MPR 6/22/98, p. 3).

Figure 1. PlayStation 2000 employs an unprecedented level of
parallelism to achieve workstation-class 3D performance.


Totally New, But Still Backward Compatible 

On its own merits, the PSX2 will be compelling enough to
attract a large following. But to be safe, SCE will lure current 
customers to PSX2 by making it backward compatible with 
PlayStation. This compatibility will, it hopes, prevent the 
Osborne effect and avoid a drop in sales of PlayStation games 
to those anticipating the new platform, which won't arrive 
until 4Q99 in Japan and 3Q00 elsewhere. Lack of compatibil- 
ity prevented previous game-console manufacturers from car- 
rying momentum from one generation to the next. 

SCE takes the brute-force approach to compatibility: it 
will simply include an identical copy of the PlayStation CPU 
in the new platform. So as not to waste silicon, this CPU 
serves as the PSX2’s I/O processor when running new games,
switching to the role of central processor to run old games. 
With this approach, the performance and quality of legacy 
games will be the same as on the original PlayStation. 


Emotion Is the Difference 

Although the EE provides conventional polygon-based
rendering, it also supports more computationally complex
curved surfaces, using NURBS-based (nonuniform rational
B-splines) models, significantly boosting image quality. But
much of the EE’s compute power will go toward an even
loftier goal: behavioral synthesis, or, as SCE calls it, emotion
synthesis. This technology gives game programmers the
ability to accurately model all manner of physical systems,
allowing realistic behavior of characters and objects. For
example, the system will enable lifelike facial expressions, as
Figure 2 shows. The digital wind will ruffle hair and clothes.
Gravity, mass, and friction will influence the motion of
objects. And the properties of materials such as water,
wood, metal, and gas will all be accurately simulated.

Figure 2. PlayStation 2000 screenshot. (Source: Namco)


Floating Point Key to Emotion

Making the vision of emotion synthesis a reality will
require massive floating-point computational horsepower.
To deliver these capabilities, the EE provides several
autonomous processing units, as Figure 3 shows.

Figure 3. The PSX2’s Emotion Engine provides ten floating-point
multiplier-accumulators, four floating-point dividers, and an
MPEG-2 decoder to deliver killer multimedia performance.

The EE chip itself includes a dual-issue superscalar
core with 128-bit SIMD-integer capability and a scalar
floating-point unit. This core is tightly coupled to a vector
floating-point unit (VPU0); together the core and VPU0
run the game code and perform the high-level modeling
computations. VPU0 can also be pressed into service for
3D-geometry transformations when it is not otherwise
occupied.

A second vector floating-point unit, VPU1, is dedicated
to 3D geometry and lighting. This unit runs independently,
in parallel with the CPU, under microcode control. An
autonomous image-processing unit (IPU) and a 10-channel
DMA controller also operate in parallel with the CPU. All
units pass graphics-display-list entries to the graphics
interface (GIF), which prioritizes requests and passes them to the
graphics synthesizer for rendering. All units connect through
an on-chip shared 128-bit bus to a dual-channel Direct
Rambus DRAM (DRDRAM) memory controller.


MIPS at the Core 
The heart of the Emotion Engine is a superscalar RISC core
with two 64-bit integer units and a single-precision scalar
FPU. Although SCE and Toshiba in their ISSCC paper said the
EE would operate at 250 MHz, SCE says the product will
actually ship at 300 MHz. At that speed, the core achieves a
Dhrystone 2.1 rating of 436 MIPS using the GNU C compiler.

Based primarily on the MIPS III (R4000) architecture,
the core also includes many of the MIPS IV (R5000/R10000)
instructions. But instead of the MIPS-standard MDMX
64-bit SIMD-integer instructions, SCE defined a completely
new set of 128-bit SIMD-integer instructions. The 107 new
instructions are implemented by doubling the width of the
general-purpose registers to 128 bits and ganging together the
two 64-bit integer units to process 128-bit-wide SIMD
operands. Together, the two units can perform four 32-bit, eight
16-bit, or sixteen 8-bit integer arithmetic operations each
cycle. SIMD instructions include add, subtract, multiply,
divide, min/max, shift, logical, leading-zero count, 128-bit
load or store, and 256-bit-to-128-bit funnel shift. SCE would
not describe a few of the instructions for competitive reasons.
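To make the lane arithmetic concrete, the Python sketch below models one 128-bit register as a Python integer and adds two registers lane by lane, as four 32-bit, eight 16-bit, or sixteen 8-bit fields. It illustrates partitioned SIMD arithmetic in general, not SCE's instruction semantics: the helper names are hypothetical, and the sketch assumes plain wraparound on overflow, since the article does not say which overflow behaviors the EE's instructions provide.

```python
# Lane-wise (partitioned) addition on a 128-bit value. A Python int stands in
# for one 128-bit register; overflow wraps within each lane (an assumption).

def split_lanes(value: int, lane_bits: int) -> list:
    """Split a 128-bit integer into unsigned lanes, lowest lane first."""
    mask = (1 << lane_bits) - 1
    return [(value >> (i * lane_bits)) & mask for i in range(128 // lane_bits)]

def join_lanes(lanes: list, lane_bits: int) -> int:
    """Pack unsigned lanes (lowest first) back into one 128-bit integer."""
    mask = (1 << lane_bits) - 1
    result = 0
    for i, lane in enumerate(lanes):
        result |= (lane & mask) << (i * lane_bits)
    return result

def packed_add(a: int, b: int, lane_bits: int) -> int:
    """Add a and b lane by lane; carries never propagate into the next lane."""
    mask = (1 << lane_bits) - 1
    sums = [(x + y) & mask
            for x, y in zip(split_lanes(a, lane_bits), split_lanes(b, lane_bits))]
    return join_lanes(sums, lane_bits)

# One 128-bit register holds four 32-bit, eight 16-bit, or sixteen 8-bit lanes.
for bits in (32, 16, 8):
    print(f"{bits}-bit lanes per register: {128 // bits}")

a = join_lanes([0xFF] * 16, 8)  # every 8-bit lane holds 255
b = join_lanes([1] * 16, 8)     # every 8-bit lane holds 1
print(split_lanes(packed_add(a, b, 8), 8))  # sixteen zeros: each lane wraps
```

A conventional 128-bit add of the same operands would instead carry out of each byte into the next; cutting the carry chain at lane boundaries is the essential idea behind any partitioned SIMD datapath.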

A 16K two-way set-associative instruction cache, an 8K
two-way set-associative data cache, and a 16K (1K × 128-bit)
scratchpad RAM (SPR) feed the core. The data cache is
nonblocking, allowing hits to proceed while a miss is being
serviced. Both the data cache and the SPR access in one cycle.
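The cache sizes above translate into set counts and address-bit splits with simple arithmetic. The sketch below assumes the 64-byte line size that the article gives for both caches; the function name is hypothetical.

```python
# Cache geometry from the capacities in the text, assuming 64-byte lines.

def geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2(line size)
        "index_bits": sets.bit_length() - 1,         # log2(number of sets)
    }

LINE = 64
icache = geometry(16 * 1024, 2, LINE)
dcache = geometry(8 * 1024, 2, LINE)
spr_bytes = 1024 * 128 // 8  # 1K entries of 128 bits each

print("I-cache:", icache)
print("D-cache:", dcache)
print("SPR bytes:", spr_bytes)
```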

The SPR is provided to avoid thrashing the cache with 
long streams of continuous video addresses whose data is not 
reused within a video frame. Using the SPR as a double 
buffer, filled by the DMA, prevents video accesses from pol- 
luting the cache, making it more effective for other things. 
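The double-buffering scheme described above can be sketched in a few lines of Python. The DMA engine is modeled as an ordinary function call, so the overlap that real hardware provides appears only as a comment; the names dma_fill, process, and double_buffered and the chunk size are hypothetical, not part of any SCE interface.

```python
# Ping-pong (double) buffering: while the CPU works on one half of the
# scratchpad, the DMA engine fills the other half.

def dma_fill(stream, start, count):
    """Stand-in for a DMA transfer: copy the next chunk of the stream."""
    return stream[start:start + count]

def process(chunk):
    """Stand-in for per-chunk CPU work (here, summing the values)."""
    return sum(chunk)

def double_buffered(stream, chunk):
    buffers = [None, None]                  # the two halves of the scratchpad
    buffers[0] = dma_fill(stream, 0, chunk)  # prime the first buffer
    pos, cur, results = chunk, 0, []
    while buffers[cur]:
        nxt = 1 - cur
        # In hardware, this fill would overlap with the processing below.
        buffers[nxt] = dma_fill(stream, pos, chunk)
        pos += chunk
        results.append(process(buffers[cur]))
        cur = nxt                           # swap roles of the two buffers
    return results

print(double_buffered(list(range(10)), 4))  # [6, 22, 17]
```

Because each chunk is consumed once and never revisited, keeping it out of the data cache costs nothing in hit rate, which is the point the article makes about video streams.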

A MIPS-compliant combined instruction/data TLB 
with 48 double entries (two physical page numbers selected 
by a low-order virtual-address bit) translates virtual ad- 
dresses to physical addresses. The instruction and data 
caches are both virtually indexed and physically tagged with 
64-byte lines. The SPR is logically in a separate memory 
space, enabled by an S-flag bit in the page-table entries. 
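A "double" TLB entry, as described above, holds one virtual tag and two physical page numbers, with a low-order virtual-address bit choosing between them. The sketch below models that selection in the R4000 style; it assumes fixed 4-KB pages and a linear search, and omits the page-mask, ASID, and permission fields a real entry carries.

```python
# One MIPS-style "double" TLB entry: a single virtual tag (VPN2) maps an
# even/odd pair of adjacent pages to two independent physical frames.
from dataclasses import dataclass

PAGE_SHIFT = 12  # 4-KB pages (assumed)

@dataclass
class TLBEntry:
    vpn2: int      # virtual page number with its lowest bit dropped
    pfn_even: int  # physical frame for the even page of the pair
    pfn_odd: int   # physical frame for the odd page of the pair

def translate(entries, vaddr: int) -> int:
    vpn2 = vaddr >> (PAGE_SHIFT + 1)          # tag shared by both pages
    odd = (vaddr >> PAGE_SHIFT) & 1           # low-order VPN bit selects the page
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    for e in entries:
        if e.vpn2 == vpn2:
            pfn = e.pfn_odd if odd else e.pfn_even
            return (pfn << PAGE_SHIFT) | offset
    raise LookupError("TLB miss")  # hardware would take a TLB-refill exception

tlb = [TLBEntry(vpn2=0x10, pfn_even=0x200, pfn_odd=0x355)]
print(hex(translate(tlb, 0x20004)))  # 0x200004 (even page)
print(hex(translate(tlb, 0x21004)))  # 0x355004 (odd page)
```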

The core can issue two instructions each cycle and, as 
Figure 4 shows, employs a simple in-order six-stage pipeline.
Branch prediction is performed via a 64-entry branch- 
target-address cache (BTAC), which records the target ad- 
dress of taken branches, and a branch-history table (BHT) 
integrated into the instruction cache (two bits per line). 

The BTAC is accessed in the first stage of the pipeline 
and, on a hit, provides a predicted target address on the next
cycle. On a BTAC miss, the BHT, which is accessed in the
instruction-fetch stage, is used to redirect instruction fetch, if 
necessary. Instructions are executed speculatively along the 
predicted path until the actual branch direction is resolved in 
the execute stage. If a branch is mispredicted, speculatively 
executed instruction results are discarded and the pipeline is 
restarted, imposing a three-cycle penalty. 
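The article says only that the BHT keeps two bits per instruction-cache line. A common use of two bits is a saturating counter, sketched below under that assumption (states 0 and 1 predict not-taken, 2 and 3 predict taken), together with the three-cycle mispredict cost from the text. The encoding and the initial state are assumptions, not SCE's documented design.

```python
# Two-bit saturating-counter branch predictor plus mispredict cost.
MISPREDICT_PENALTY = 3  # cycles, per the text

def simulate(outcomes, state=2):
    """Return (mispredictions, penalty cycles) for a sequence of branch outcomes."""
    mispredicts = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredicts += 1
        # Move toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredicts, mispredicts * MISPREDICT_PENALTY

# A loop branch taken nine times and then falling through, run twice.
loop = ([True] * 9 + [False]) * 2
print(simulate(loop))  # (2, 6)
```

The hysteresis matters for loops: a single fall-through at loop exit only weakens the prediction, so the branch is still predicted taken when the loop is entered again.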


Vector Units Provide Massive FP Power

VPU0 is directly attached to the core by a 128-bit operand
bus, with each VPU0 macroinstruction executed as a MIPS
coprocessor instruction. VPU0 macroinstructions (e.g.,
matrix × vector multiply) are sequenced by microcode
from a 4K SRAM instruction memory (microMEM0). The
second vector unit, VPU1, runs asynchronously with the
CPU, sequenced completely by microcode from its 16K
instruction memory (microMEM1). Each vector-unit
microinstruction is a two-wide VLIW instruction, as Figure
5 shows.

Figure 4. The EE’s CPU uses a six-stage dual-issue pipeline. The
vector units can each execute four single-precision floating-point
multiply-accumulates every cycle, as well as one floating-point
divide every seven cycles.

Each vector unit has thirty-two 128-bit floating-point
registers, sixteen 16-bit integer registers, and a local data
memory. The data memory for VPU0 (VUMEM0) is 4K in
size, while VUMEM1 is 16K. Both VUMEMs can be read and
written by the DMA controller. VUMEM1 is connected
directly to the GIF for rapid transmission of display lists to
the GS.

Vector operands consist of four IEEE single-precision
values that are distributed to the vector unit’s four parallel
multiply-accumulate units (FMACs). Each FMAC has a
latency of four cycles and is fully pipelined for a throughput
of one multiply-accumulate per cycle. The vector units each
include one FP divider (FDIV) with a throughput of one
divide (or square root) every seven cycles. VPU1 also includes
an elementary-function unit (EFU) for performing the
scalar operations that go with 3D-geometry operations.
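The article characterizes a vertex operation as 19 mul-adds plus one divide. One plausible breakdown, sketched below as an assumption since SCE's exact microcode is not described, is a 4 × 4 matrix times a vertex (16 multiply-accumulates, issued as four broadcast steps that each occupy all four FMACs), one divide to form 1/w, and three multiplies to scale x, y, and z. Python computes in double precision here, whereas the vector units are single precision.

```python
# 4x4 transform plus perspective divide, counting the operations used.

def transform_vertex(m_cols, v, counts):
    """Transform vertex v = (x, y, z, w) by a 4x4 matrix given as four columns."""
    acc = [0.0, 0.0, 0.0, 0.0]
    for column, scalar in zip(m_cols, v):  # four broadcast steps...
        for lane in range(4):              # ...each occupying all four FMACs
            acc[lane] += column[lane] * scalar
            counts["mul_add"] += 1
    inv_w = 1.0 / acc[3]                   # the single divide
    counts["divide"] += 1
    result = []
    for lane in range(3):                  # scale x, y, z by 1/w
        result.append(acc[lane] * inv_w)
        counts["mul_add"] += 1             # a multiply occupies an FMAC slot
    return result

counts = {"mul_add": 0, "divide": 0}
m_cols = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 2]]  # w' = 2w
result = transform_vertex(m_cols, (2.0, 4.0, 6.0, 1.0), counts)
print(result, counts)  # [1.0, 2.0, 3.0] {'mul_add': 19, 'divide': 1}
```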


Image Processor Decodes MPEG-2
With DVD-ROM as the delivery medium, content providers
will have from 8 to 28 times the storage capacity of Play- 
Station’s CD-ROMs. This huge increase in capacity allows 
more complex, more sophisticated, and more realistic games, 
and it allows games to use high-quality video and sound. 

Just as important, however, the DVD drive enables the 
playback of DVD movies, opening the potential for the PSX2 
to serve that role in home entertainment systems. While the
PSX2 hardware has the capability to play DVD movies, SCE
so far has refused to confirm that it will enable that feature. 
Although the company could be trying to avoid cannibaliz- 
ing another Sony product, we expect that it will eventually 
come to the right decision and include the feature. 

To decode the MPEG-2 compressed-video format used
in DVD movies, the EE provides an autonomous image-processing
unit (IPU) that operates in parallel with the CPU.
While decoding a video stream, the DMA controller feeds
compressed video to the IPU’s input FIFO over the internal
data bus. The FIFO smooths out the data transfers, preventing
the IPU from stalling while other transactions are on the
bus.

Figure 5. Each vector unit has enough parallelism to complete a
vertex operation (19 mul-adds + 1 divide) every seven cycles.

The IPU can perform variable-length decoding (VLD),
zigzag scanning, inverse quantization (IQ), and inverse
discrete cosine transform (IDCT) operations, decoding
MPEG-2 macroblocks at a rate of 768 cycles per macroblock
(2.56 µs per macroblock). After macroblock decoding, the IPU
performs color-space conversion (YCbCr to RGB), vector
quantization, and 4 × 4 ordered dithering. The IPU decompresses
video at 150 Mpixels/s, fast enough even for HDTV.
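The 768-cycle figure can be checked against an HDTV workload. The sketch below converts cycles to time at the 300-MHz CPU clock (which reproduces the 2.56-µs figure) and compares the resulting macroblock rate with an assumed HDTV load of 1920 × 1080 video, coded as 1920 × 1088 whole macroblocks, at 30 frames/s. It ignores bitstream-dependent variation and bus stalls, so it is an upper bound, not a measured rate.

```python
CPU_HZ = 300e6
CYCLES_PER_MACROBLOCK = 768

mb_time_us = CYCLES_PER_MACROBLOCK / CPU_HZ * 1e6  # time per macroblock
mb_per_sec = CPU_HZ / CYCLES_PER_MACROBLOCK        # macroblock decode rate

# Assumed HDTV load: 1920 x 1080 coded as 1920 x 1088 (16 x 16 macroblocks),
# 30 frames/s.
hd_mb_per_frame = (1920 // 16) * (1088 // 16)      # 120 x 68 macroblocks
hd_mb_per_sec = hd_mb_per_frame * 30

print(f"{mb_time_us:.2f} us per macroblock, {mb_per_sec:,.0f} macroblocks/s")
print(f"HDTV needs {hd_mb_per_sec:,} macroblocks/s; "
      f"headroom {mb_per_sec / hd_mb_per_sec:.2f}x")
```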

In addition to its video-decoding duties, the IPU
decompresses 3D-texture maps stored in main memory. The
IPU performs on-demand texture decompression to either a
high, medium, or low resolution, depending on the
requirements of the image being rendered. The DMA controller
transfers the decompressed video or texture data from the
IPU's output FIFO over the internal data bus to the GIF.


Bandwidth Not Ignored 

The internal data bus (the backbone connecting all of the
EE’s processing units to main memory and external
peripherals) is 128 bits wide and operates at half of the CPU clock
frequency. At a 300-MHz CPU speed, the bus provides a
peak bandwidth of 2.4 GBytes/s. DMA transfers over this
bus occur in packets of eight 128-bit words, minimizing the
protocol overhead. The bus achieves an efficiency of about
85%, leaving an effective bandwidth of about 2 GBytes/s.
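The bus figures follow directly from width times clock. The sketch below reproduces them from the numbers in the text; the 85% efficiency is the article's estimate, not something the arithmetic derives.

```python
BUS_BITS = 128
CPU_HZ = 300e6
bus_hz = CPU_HZ / 2                # bus runs at half the CPU clock
peak = BUS_BITS / 8 * bus_hz       # bytes per second
effective = peak * 0.85            # about 85% efficiency, per the text
packet_bytes = 8 * BUS_BITS // 8   # one DMA packet: eight 128-bit words

print(f"peak {peak / 1e9:.2f} GBytes/s, effective {effective / 1e9:.2f} GBytes/s, "
      f"DMA packet {packet_bytes} bytes")
```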

The main-memory controller connects two channels of
DRDRAM to the internal data bus. Although the two channels
together can supply data at 3.2 GBytes/s, 33% faster than
the internal bus can take it, buffering in the memory
controller can make use of the extra bandwidth.

I/O interface circuits on the EE connect the internal
data bus to an external 32-bit I/O bus, which transfers data
between the EE and the I/O processor (IOP) at 37.5 MHz,
one-eighth of the CPU frequency, as Table 1 shows. Although
the bus could easily run faster, 150 MBytes/s of
bandwidth is more than enough to carry all the I/O traffic
from the DVD drive (1.5 MBytes/s), USB (up to
12 Mbits/s), IEEE-1394 (up to 400 Mbits/s), sound samples
(up to 1.5 MBytes/s), and a 56-Kbit/s V.90 modem.
SCE will offer the modem on a PC Card (PCMCIA) in
order to retain flexibility as communications technology
evolves and also to deal with different regulations in
different countries.
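Summing the peripheral rates listed above shows how much of the 150-MBytes/s I/O bus they would consume. This worst-case sketch assumes every device runs at its peak rate simultaneously and ignores protocol overhead; rates are in MBytes/s with 1 MByte = 10^6 bytes.

```python
io_bus = 32 / 8 * 37.5          # 32-bit bus at 37.5 MHz -> 150 MBytes/s
loads = {
    "DVD drive": 1.5,
    "USB": 12 / 8,              # 12 Mbits/s
    "IEEE-1394": 400 / 8,       # 400 Mbits/s
    "sound samples": 1.5,
    "V.90 modem": 0.056 / 8,    # 56 Kbits/s
}
total = sum(loads.values())

for name, rate in loads.items():
    print(f"{name:14s} {rate:8.3f} MBytes/s")
print(f"total {total:.2f} of {io_bus:.0f} MBytes/s ({total / io_bus:.0%})")
```

IEEE-1394 dominates the budget; even with everything at full speed, roughly two-thirds of the bus remains idle.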

SCE has said little about the IOP other than that it is 
being designed jointly with LSI Logic and that its 32-bit 
MIPS-architecture core will be identical to the current 
PlayStation CPU, except for some minor enhancements to 
the cache and a 4× increase in DMA transfer rates. To achieve
strict backward compatibility, the CPU will operate at 
33.8 MHz, just like the original. 


Emotion Engine                Description
Frequency                     300 MHz
CPU Core:                     MIPS III, MIPS IV subset + 128-bit SIMD
  Registers                   32 x 128-bit
  Microarchitecture           2-issue, two 64-bit integer units, 1 FPU
  CPU Pipeline                6 stages
  Instruction Cache           16K, two-way set-associative
  Data Cache                  8K, two-way set-associative
  Scratchpad RAM              16K
  TLBs                        48-entry combined instruction/data TLB
Vector Unit 0:                4 FMACs, 1 FDIV
  Memory                      4K instruction, 4K data
Vector Unit 1:                5 FMACs, 2 FDIV
  Memory                      16K instruction, 16K data
Image Processing Unit         MPEG-2 macroblock decoder
DMA                           10 channels
On-Chip Bus Bandwidth         2.4 GBytes/s peak, 2.0 GBytes/s effective
Main Memory:                  32M, two DRDRAM channels
  Bandwidth                   3.2 GBytes/s peak
Performance:
  Floating-Point Peak         6.2 GFLOPS
  Perspective Transform       66 M polygons/s
  With Lighting & Fog         36 M polygons/s
  Bezier Surface Patches      16 M polygons/s
  Image Decompression         150 M pixels/s
Process:                      0.25 µm (0.18 µm Leff), 4-layer-metal
  Size                        240 mm², 10.5 million transistors
  Power                       15 W at 1.8 V
  Package                     540-contact PBGA

Graphics Synthesizer          Description
Frequency                     150 MHz
Pixel Processing              16 parallel processors
Display List Bandwidth        1.2 GBytes/s
Video Memory:                 4M multiported embedded DRAM
  Bandwidth                   48 GBytes/s (2,560-bit bus)
Pixel Format                  32-bit RGBα, 32-bit Z
Rendering Performance:        75 M polygons/s peak
  48-Pix Quad w/Z,A           50 M polygons/s (2.4 Gpixels/s)
  48-Pix Quad w/Z,A,T         25 M polygons/s (1.2 Gpixels/s)
  8 x 8-Pixel Sprites         18.75 million/s
  Particles                   150 million/s
Output                        NTSC, PAL, VESA (1,280 x 1,024 max)
Process:                      0.25 µm (0.25 µm Leff), 4-layer-metal
  Size                        279 mm², 42.7 million transistors
  Package                     384-contact BGA

I/O Processor                 Description
Frequency                     33.8 MHz or 37.5 MHz selectable
CPU:                          MIPS (R3000 based)
  Compatibility               PlayStation
  Characteristics             Enhanced cache, 4x PlayStation DMA
I/O Bus to EE                 150 MBytes/s, 32-bit bus
Local-Bus Devices             DVD-ROM, PC Card, SPU2 sound chip
IEEE-1394                     100-400 Mbits/s
USB                           1.5-12 Mbits/s

SPU2 Sound Chip               Description
Voices                        48-channel + software voices on CPU
Sampling Rates                44.1 kHz or 48 kHz selectable

Table 1. The PlayStation 2000’s four main chips offer the most impressive list of features ever in a consumer game console. (Source: SCE)

Vector Units Produce 66 Million Polygons/s
In both vector units, the latency of the divider is balanced
with the throughput of the FMAC units so that, with three-stage
software pipelining, each vector unit can achieve a
throughput of one complete 3D-vertex operation (19 mul-adds
+ 1 divide) every seven cycles. At 300 MHz, with both
vector units blazing, the EE can transform 66 Mpolygons/s.
In contrast, a Pentium III-500 generates only 4 Mpolygons/s.
With lighting and fog effects applied, the EE’s polygon
throughput drops only to 36 Mpolygons/s, 18 times the rate
attained by a Pentium III-500. Even bandwidth-eating Bezier
surface patches can be generated at 16 Mpolygons/s; as
Table 2 shows, most other processors don’t even handle
Bezier patches, or do them too slowly to be useful.
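The cycle balance described above can be checked with a few lines of arithmetic. Using the figures in the text (four FMACs per vector unit, 19 mul-adds and one divide per vertex, one divide every seven cycles, 300 MHz), the sketch shows that the divider, not the FMACs, sets the seven-cycle pace. It is a simple upper-bound model, not SCE's derivation: it yields about 43 million vertex operations per second per unit and about 86 million for both, so the quoted 66 Mpolygons/s presumably reflects overheads this model ignores.

```python
import math

FMACS = 4          # multiply-accumulate units per vector unit
MUL_ADDS = 19      # per vertex operation, from the text
DIV_INTERVAL = 7   # cycles between successive divides
CLOCK_HZ = 300e6
UNITS = 2          # VPU0 and VPU1

fmac_cycles = math.ceil(MUL_ADDS / FMACS)           # 5 cycles if perfectly packed
cycles_per_vertex = max(fmac_cycles, DIV_INTERVAL)  # the divider sets the pace
per_unit = CLOCK_HZ / cycles_per_vertex             # vertex operations per second
fmac_utilization = MUL_ADDS / (FMACS * cycles_per_vertex)

print(f"{cycles_per_vertex} cycles/vertex, {per_unit / 1e6:.1f} M/s per unit, "
      f"{UNITS * per_unit / 1e6:.1f} M/s for both, FMACs {fmac_utilization:.0%} busy")
```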


GS Chip Guzzles 75 Million Polygons/s 

The polygon display lists created by the EE’s vector units are 
transferred by the GIF to the GS over a dedicated 64-bit bus 
at 150 MHz. At this clock rate, the graphics bus provides 
1.2 GBytes/s of display-list bandwidth, more than twice that 
of 2x AGP and more even than 4x AGP. 
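The GIF comparison follows from bus width times clock. The sketch assumes the standard AGP base transfer of 32 bits at about 66.67 MHz (266 MBytes/s), doubled for 2x and quadrupled for 4x.

```python
GIF_BITS, GIF_HZ = 64, 150e6
AGP_BITS, AGP_BASE_HZ = 32, 66.67e6  # AGP 1x: 32 bits at about 66.67 MHz

gif = GIF_BITS / 8 * GIF_HZ          # bytes/s
agp = {f"{n}x AGP": n * AGP_BITS / 8 * AGP_BASE_HZ for n in (1, 2, 4)}

print(f"GIF: {gif / 1e9:.2f} GBytes/s")
for name, bw in agp.items():
    print(f"{name}: {bw / 1e9:.2f} GBytes/s (GIF is {gif / bw:.2f}x)")
```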

The polygon-rendering ability of any graphics system is
a strong function of the bandwidth between the pixel
processor(s) and video memory. Not satisfied with the puny
bandwidth that conventional 3D-rendering chips achieve with
discrete SDRAM, SGRAM, or RDRAM memory, SCE integrated
the video memory onto the same chip as the pixel engines.
With this organization, the GS provides a monstrous
48 GBytes/s of bandwidth, 15 times the 3.2 GBytes/s provided by
the discrete 128-bit 200-MHz SGRAM video memory used
on many high-end graphics systems today.

The GS's 4M of embedded multiport DRAM has a data-bus
width of 2,560 bits, allowing simultaneous 1,024-bit
video reads, 1,024-bit video writes, and 512-bit texture reads.
The GS uses a 32-bit RGBα pixel format with a 32-bit Z value
and supports the full gamut of rendering functions, including
texture mapping, bump mapping, fogging, alpha blending,
bi- and trilinear filtering, MIP mapping, antialiasing, and
multipass rendering. The 4M memory is adequate to double
buffer an NTSC frame with full 32-bit color and Z buffering.
The graphics synthesizer output supports NTSC, PAL, DTV,
and VESA (1,280 × 1,024 maximum) formats.
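The claim that 4M suffices for a double-buffered NTSC frame with Z can be checked directly. The sketch assumes "4M" means 4 MBytes and an NTSC frame of 640 × 480 pixels rendered in full (interlaced field rendering would need less); both are assumptions, not figures from SCE.

```python
MB = 1024 * 1024
GS_DRAM = 4 * MB               # "4M" read as 4 MBytes
WIDTH, HEIGHT = 640, 480       # assumed full NTSC frame
COLOR_BYTES, Z_BYTES = 4, 4    # 32-bit RGBA and 32-bit Z, per the text

frame = WIDTH * HEIGHT
color = 2 * frame * COLOR_BYTES  # front and back buffers
zbuf = frame * Z_BYTES           # one shared Z buffer
used = color + zbuf

print(f"{used:,} bytes ({used / MB:.2f} MB) used; {GS_DRAM - used:,} bytes left")
```

Under these assumptions, the frame and Z buffers take about 3.5 MBytes, leaving roughly half a megabyte on chip for textures.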


Caution: 3D-Performance Claims Follow 

Given the GS's extraordinary video-memory bandwidth and 
its 16 parallel pixel-processing engines, SCE boasts of a peak 
rendering rate of 75 Mpolygons/s and a pixel fill rate of 
2.4 Gpixels/s for Gouraud-shaded, Z-buffered, and alpha- 
blended polygons. Adding texture cuts the fill rate in half to 
1.2 Gpixels/s, or 25 million 48-pixel quadrilaterals per sec- 
ond. The PSX2 can also draw 19 million sprites/s (8 x 8 pix- 
els) and 150 million particles/s (for smoke and spark effects). 
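These rendering rates are mutually consistent, as the sketch below shows. It assumes each of the 16 pixel engines completes one pixel per 150-MHz clock (which reproduces the quoted 2.4 Gpixels/s) and that sprites are drawn at the textured rate; both are assumptions that happen to match SCE's figures.

```python
PIXEL_ENGINES = 16
GS_HZ = 150e6
QUAD_PIXELS = 48
SPRITE_PIXELS = 8 * 8

fill = PIXEL_ENGINES * GS_HZ  # assumes one pixel per engine per clock
textured = fill / 2           # texturing halves the rate, per the text

print(f"Z + alpha: {fill / 1e9:.1f} Gpixels/s = {fill / QUAD_PIXELS / 1e6:.0f} M quads/s")
print(f"+ texture: {textured / 1e9:.1f} Gpixels/s = {textured / QUAD_PIXELS / 1e6:.0f} M quads/s")
print(f"8 x 8 sprites: {textured / SPRITE_PIXELS / 1e6:.2f} M/s")
```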

PSX2's fully Z-buffered, alpha-blended, and textured
drawing rate of 1.2 Gpixels/s is higher than those of even
the newest round of PC 3D-game hardware (see MPR 4/19/99,
p. 17). Voodoo3 from 3Dfx, for example, delivers 183 Mpixels/s,
while Nvidia's RIVA TNT2 achieves 350 Mpixels/s. PSX2 also
outpaces many graphics workstations, such as SGI’s newest
500-MHz Pentium III-based NT workstation with a Cobalt
graphics chip (the 320), which delivers about 150 Mpixels/s.

The 3D-graphics throughput of many systems, however, 
is held below their maximum rendering rates by the limited 
polygon-producing abilities of the CPU and by the bandwidth
available to deliver polygon display lists to the rendering
engine. By the time PSX2 ships in volume, Pentium III's speed
could increase by nearly 50%, to 733 MHz, and AGP will have
gone from the 2x to the 4x level (1 GByte/s). But even if these 
speedups fall straight through to drawing rates, which they 
won't, the PSX2 will still be 2-4x faster than the fastest PCs 
and many graphics workstations. 

It should be noted, however, that the polygon rates
quoted by 3D vendors tend to vastly overstate performance.
In fact, some game developers have already expressed
skepticism about SCE’s performance claims for PSX2. Misled by the
company’s grossly exaggerated claims for the original
PlayStation, which some claim were as much as a factor of five
higher than the machine delivers in practice, developers are
wary. Although SCE may be exaggerating its claims for the
PSX2 as well, the PSX2, unlike the original PlayStation, is a serious
piece of 3D hardware that seems to have the horsepower and
bandwidth to back up SCE’s claims. We see no reason to
expect SCE’s inflation factor to be any higher than that
traditionally used by PC and workstation 3D vendors.

3D-Geometry Performance   PlayStation 2000 PlayStation      Nintendo 64      Dreamcast        PII-400          PIII-500         Gamma 3 + R4
Peak GFLOPS               6.2              Integer only     0.02             1.4              0.4              2                26
Transformations (peak)    66 Mpolys/s      2 Mpolys/s       n/a              4 Mpolys/s       1.7 Mpolys/s     4 Mpolys/s       44 Mpolys/s
Transformations (real)    36 Mpolys/s      0.5 Mpolys/s     0.2 Mpolys/s     2 Mpolys/s       0.8 Mpolys/s     2 Mpolys/s       30 Mpolys/s
Bezier Surface Patches    16 Mpolys/s      None             None             None             Slow             Slow             None

3D-Rendering Performance  PlayStation 2000 PlayStation      Nintendo 64      Dreamcast        Voodoo3          RIVA TNT2        R4
Pixel Processing Engines  16               1                                 n/a              2                2                1
Simple Polygons (peak)    75 Mpolys/s      0.7 Mpolys/s                      3 Mpolys/s       8 Mpolys/s       n/a              n/a
+ Z-Buffering & Alpha     2,400 Mpixels/s  Simulated                         n/a              183 Mpixels/s    350 Mpixels/s    125 Mpixels/s
+ Z, Alpha, & Texture     1,200 Mpixels/s  34 Mpixels/s                      n/a              183 Mpixels/s    350 Mpixels/s    125 Mpixels/s

Table 2. This table offers an indication of the PSX2’s relative performance, although, since vendors do not adhere to any standard for measuring 3D-rendering rates, the numbers are not always directly comparable. The original PlayStation is integer only and offers only directional lighting, with no support for perspective, Z buffering, or bilinear filtering. Its polygon rates are somewhat higher than N64's, but N64 supports more rendering features, allowing fewer polygons to be used. N64 uses floating point for physics calculations only. The PSX2 will provide full-featured 3D rendering at higher performance than any other game console or PC. Only a high-end 3Dlabs Gamma 3 with multiple R4 rendering chips (see MPR 11/16/98, p. 20) comes even close. n/a = not available. (Source: MDR estimates based on vendor data)

Figure 6. The Emotion Engine, heart of Sony’s second-generation
PlayStation, implements 10.5 million transistors and measures
17 × 14.1 mm in a 0.25-micron four-layer-metal 1.8-V process
with 0.18-micron gates. (Source: SCE and Toshiba)


Workstation-Class 3D at Consumer Prices? 
Without more complete information on how the PSX2’s per- 
formance numbers were derived, their implications are im- 
possible to evaluate. If the results turn out to have been 
obtained under ideal conditions (e.g., repeatedly calculating
the same polygon), then the numbers, impressive as they are, 
indicate little about the platform's ultimate performance. But 
if the advertised rates can be sustained while forming real 3D 
objects, which SCE claims they can, the machine will indeed 
offer 3D performance well in excess of all current game con- 
soles, high-end PCs, and even many graphics workstations. 

On the basis of parallelism, bandwidth, and computa- 
tional horsepower, we expect the PSX2 may indeed deliver this 
level of performance. What is less clear is whether SCE can 
deliver the chips at costs suitable for a consumer game con- 
sole. The PlayStation debuted at $299; it and the Nintendo 64 
are both now selling for $129, and Dreamcast is expected to 
sell for $199. Considering PSX2’s additional DVD function 
and its dramatically higher 3D performance, the PSX2 should 
command a premium. Indeed, if SCE delivers the PSX2 at 
$299, life will become exceedingly miserable for Sega. 

But unless SCE and Toshiba know something about
manufacturing large die that the rest of the industry doesn't,
$299 is out of reach, at least initially. With 10.5 million
transistors on 240 mm² of 0.25-micron silicon (as Figure 6 shows),
the EE alone could cost $130 to build, according to the MDR
Cost Model. With 42.7 million transistors on 279 mm² (as
Figure 7 shows), the GS will also cost about $130 (assuming
DRAM redundancy is used to improve yield). Along with the
costs of the other logic chips, the DRDRAMs, a fan to cool the
15-W Emotion Engine, and a DVD-ROM drive, the bill of
materials will easily exceed $299. Although game-console
manufacturers do not expect much margin on the console
(profits come from games), we still expect SCE to sell the
game console initially for between $400 and $500.

Figure 7. With 4M of multiported DRAM and 16 pixel processors,
the 42.7-million-transistor graphics synthesizer is 16.7 × 16.7 mm
in a 0.25-micron process with 0.25-micron gates. (Source: SCE)

At these costs, the chips were apparently designed in 
anticipation of process shrinks; presumably, SCE is prepared 
to suffer in the meantime. We expect the chips will be shrunk 
to a full 0.18-micron process before high volume is reached, 
lowering costs into the $65 range: still high, but possibly
enabling a $299 system price. Further shrinks will be needed 
to bring costs in line with the current price of game consoles. 

Surprisingly, SCE has no intention of taking advantage
of the higher frequencies that come with shrinks, which is heresy
for PC processor vendors. But SCE’s view is that the game
market, unlike the PC market, is entirely a software business;
changes that could disrupt strict compatibility are unthinkable.
In the consumer business, stability is what's paramount.
Headroom gained from shrinks will go toward yield improvement
(cost reduction), not performance.


Two Chips, Two Fabs 

To build the EE, SCE is investing $400 million in a joint ven- 
ture with Toshiba to add new production lines to existing 
Toshiba clean-room facilities in Oita, Japan. These facilities 
will enter production this fall, with a planned capacity of 
2,300 200-mm wafers per week. The companies plan to ramp 
production to two million chips per month, a feat that will
require a shrink to 0.18 micron, according to our yield model.

In addition to the joint-venture facility with Toshiba,
SCE itself is building another completely new fab in Nagasaki, 
to produce the GS. At a cost of nearly $600 million, the new 
fab will have a capacity similar to that of the Toshiba facility 
and will enter production in the spring of 2000. In the mean- 
time, the GS will be built in Sony's existing plant in Kokubu. 


Dreamcast Reeling, But Not Out Yet

A high price for PSX2 might give Sega some breathing room. 
Dreamcast is already available in Japan, and it will appear in 
the U.S. with more than 30 game titles by this Christmas. 
PSX2 hardware is nearly a year behind, and with develop- 
ment kits not yet broadly available, game developers will have 
to hustle to get titles ready by Christmas 2000, especially
ones that take advantage of PSX2's new capabilities.

With a full year unopposed as the only next-generation 
console, and with the prospect of high prices for PSX2 looming,
Dreamcast may yet survive. Still, PSX2 must be a nightmare
for Sega executives, who had probably not anticipated
such an aggressive play by SCE. Nintendo, whose next- 
generation Nintendo 2000 game console is still in develop- 
ment, was probably sent scurrying back to the drawing board 
by the strength of SCE’s announcement. 

It will be hard for competitors to match SCE’s and
Toshiba's manufacturing, PSX2’s performance, and
PlayStation’s market momentum. Another important feature that
competitors have no hope of matching is the PSX2’s backward
compatibility with a huge installed base of games.
Dreamcast has already abandoned compatibility with Saturn,
and Nintendo is saddled with ROM-based game cartridges,
a technology that will not scale to the next generation.

Although the PlayStation 2000's two main chips are 
large, the level of performance they attain is nothing short of 
remarkable. This achievement is due to several factors, all of 
which stand in stark contrast to the direction in which Intel 
and other PC-processor vendors are moving. For one thing, 
SCE spent no effort or silicon to improve the performance of 
legacy code. For another thing, it wasted little on
instruction-level parallelism. This decision is well advised, as the
interesting parallelism in multimedia applications is data- and
task-level parallelism, which are more efficiently exploited with
SIMD, vector, and multiprocessor architectures.

Also, SCE spent no silicon on features like double- or
extended-precision floating point, which add to cost but not
to 3D performance. As a result of these decisions, SCE was
able to create a chip with 10 single-precision FP
multiplier-accumulators and four FP dividers, all of which contribute
directly to 3D performance.
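These unit counts roughly account for the 6.2-GFLOPS peak quoted earlier. The sketch counts each FMAC as two floating-point operations per cycle at 300 MHz and, as an assumption, applies the vector units' one-divide-per-seven-cycles rate to all four dividers; the total of about 6.17 GFLOPS rounds to SCE's figure, though SCE has not said how it derived its number.

```python
CLOCK_HZ = 300e6
FMACS, FDIVS = 10, 4   # totals given in the text
FLOPS_PER_FMAC = 2     # one multiply plus one add per cycle
DIV_INTERVAL = 7       # assumed for all four dividers

fmac_flops = FMACS * FLOPS_PER_FMAC * CLOCK_HZ
div_flops = FDIVS * CLOCK_HZ / DIV_INTERVAL
total = fmac_flops + div_flops

print(f"FMACs {fmac_flops / 1e9:.2f} + dividers {div_flops / 1e9:.2f} "
      f"= {total / 1e9:.2f} GFLOPS")
```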

But PSX2's bleeding-edge technology has downsides.
One that SCE must overcome is its reliance on Direct Rambus DRAMs.
Ken Kutaragi, CEO of Sony Computer Entertainment, widely
known as the father of PlayStation, has publicly expressed
his concern over the current state of DRDRAMs. But there
aren't many alternatives: PC-133 SDRAM does not provide
the needed bandwidth, and DDR SDRAMs are not yet stable.

Possibly the largest technical hurdle facing SCE is
enabling developers to take advantage of the EE's
capabilities. Programming parallel machines is notoriously
difficult. As the EE’s superscalar-SIMD-vector-VLIW-MP
structure raises parallelism to new heights, so too will it
raise programming difficulty. SCE’s only hope is that a 
rich set of libraries and a good software-development 
environment will make the problem tractable. 

SCE has captured the imagination of game develop- 
ers with early demonstrations of PSX2 hardware. To 
show off the platform’s physics-modeling capability, for
example, the company used a puff-ball demo in which 
the wind affects each strand of the puff-ball individually. 

Big-name game developers Namco (www.namco.com) 
and Square (www.square.co.jp) have already presented 
technology demonstrations on PSX2 hardware. The plat- 
form provides exactly the features that Square needs for 
Final Fantasy VIII's finely rendered movie sequences. 
Anecdotal evidence so far suggests that simple ports of 
PlayStation games to PSX2 will be relatively easy and will 
get a big boost in realism. Rewriting to exploit PSX2’s 
physics-modeling and rendering capabilities, however, 
will be more difficult, but also more rewarding. 

SCE has already enlisted Animation Science
(www.anisci.com) to develop middleware for modeling natural
phenomena such as wind, rain, and snow. The company
will port its Outburst particle animation and Rampage
crowd-scene animation software to the PSX2. MathEngine
(www.mathengine.com) is also slated to provide
software to model multibody dynamics and the behavior 
of fluids and deformable objects (e.g., bouncing rubber 
balls). It will also provide an artificial-intelligence layer to 
simulate running and jumping characters. 


The aggressiveness of SCE’s platform may signal the 
company’s intention to move upscale from current game con- 
soles, cutting a wider swath through the living room. Cer- 
tainly, Internet access and DVD movies are within the scope of 
PSX2. And with USB, IEEE-1394, and PC Card interfaces, SCE 
has enabled other functions to be roped in. The new platform 
could even pose a threat to Wintel, as PSX2 can perform many 
of the functions of a home PC, given appropriate software. If,
as some believe, the technology battles of the future will be 
waged over entertainment applications as opposed to business 
applications, PSX2 definitely improves Sony’s position. 

For now, SCE’s sights are focused on the video-game 
market. Technically, the PlayStation 2000 is so far ahead of
everything else that exists, or is likely to, that SCE isn’t
concerned about other game consoles. Instead, the company
takes a broader view of competition; to it, the competition is
everything else that competes for people's time.